Technical Q&A TX06
Converting Simplified Chinese

Q How do I convert Macintosh Simplified Chinese encoding to the relevant GB standard?

A The Mac encoding for Simplified Chinese is a shifted form of GB2312. To convert a two-byte character from GB2312 to the Mac encoding, add 0x8080 to it. To convert from the Mac encoding back to GB2312, subtract 0x8080. For example, IDEOGRAPHIC COMMA (Unicode code point U+3001) is 0x2122 in GB2312 and 0xA1A2 on the Mac.
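As a quick check of that arithmetic, here is a minimal sketch in standard C. The helper names gb_to_mac and mac_to_gb are illustrative only and are not part of any Apple interface. It assumes the input is a well-formed two-byte code. Because both bytes of a GB2312 code lie in the range 0x21 to 0x7E, adding 0x8080 never carries between bytes and amounts to setting the high bit of each byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a two-byte GB2312 code into the Mac encoding. */
static uint16_t gb_to_mac(uint16_t gb)
{
    /* Assumption: both bytes have the high bit clear, so the add cannot carry. */
    assert((gb & 0x8080u) == 0);
    return (uint16_t)(gb + 0x8080u);
}

/* Hypothetical helper: the inverse shift, from the Mac encoding to GB2312. */
static uint16_t mac_to_gb(uint16_t mac)
{
    /* Assumption: both bytes of a two-byte Mac character have the high bit set. */
    assert((mac & 0x8080u) == 0x8080u);
    return (uint16_t)(mac - 0x8080u);
}
```

With these helpers, gb_to_mac(0x2122) gives 0xA1A2 and mac_to_gb(0xA1A2) gives 0x2122, matching the IDEOGRAPHIC COMMA example.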

The only characters that need special handling are the single-byte Roman characters, which are not shifted. The code below illustrates how to do the conversion.

// Returns true if the character was a two-byte character and was converted.
// Returns false if it was a single-byte (Roman) character, in which case
// only the first byte was processed and the second byte was ignored.
Boolean MacToGB2312(unsigned char first, unsigned char second,
    unsigned short *output)
{
    if (first < 0x81) {
        *output = first;
        return false;
    } else {
        unsigned short temp;
        temp = (first - 0x80) << 8;
        temp += (second - 0x80);
        *output = temp;
        return true;
    }
}

// The input is always a two-byte GB2312 character, so this always converts.
// There is no need to pass the bytes separately or to return a flag
// saying whether a conversion took place.
void GB2312ToMac(unsigned short input, unsigned short *output)
{
    *output = input + 0x8080;
}

As the code shows, you need to shift both bytes of a two-byte character. Shifting both bytes moves them above the single-byte Roman range, so it is obvious whether a given byte is part of a two-byte character or is a single-byte Roman character.
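The MacToGB2312 routine above handles one character at a time. To show how the single-byte versus two-byte distinction is used when walking a whole string, here is a hedged sketch in standard C. The function name mac_text_to_gb2312 and its parameters are hypothetical, not an Apple API. It reuses the same first-byte test as MacToGB2312 and assumes the input is well formed, except that it stops if the text ends in the middle of a two-byte character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts a Mac Simplified Chinese byte string into an
   array of 16-bit values, one per character (Roman characters unchanged,
   two-byte characters shifted down to GB2312).
   Returns the number of characters written to out. */
static size_t mac_text_to_gb2312(const unsigned char *in, size_t in_len,
                                 unsigned short *out, size_t out_cap)
{
    size_t i = 0;   /* index of the next input byte */
    size_t n = 0;   /* number of characters written */

    assert(in != NULL && out != NULL);
    while (i < in_len && n < out_cap) {
        unsigned char first = in[i];

        if (first < 0x81) {
            /* Single-byte character: copy it through and advance one byte. */
            out[n++] = first;
            i += 1;
        } else {
            if (i + 1 >= in_len)
                break;  /* lead byte with no trail byte: stop converting */
            /* Two-byte character: shift both bytes down by 0x80. */
            out[n++] = (unsigned short)(((first - 0x80) << 8)
                                        + (in[i + 1] - 0x80));
            i += 2;
        }
    }
    return n;
}
```

The caller sizes out for the worst case of one character per input byte. A production converter would also validate the trail byte, which this sketch does not do.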

For Further Information

  • Understanding Japanese Information Processing, by Ken Lunde, published by O'Reilly & Associates.

[Feb 09 1996]

